Read PDFs Aloud with AI Voice (Text-to-Speech)
Annolid includes an integrated PDF viewer with text-to-speech (TTS), so you can open a PDF and have selected text (or whole paragraphs) read aloud.
Prerequisites
Install Annolid (see install.md).
Install PDF support:
pip install pymupdf
Install a TTS backend (pick one):
Recommended (offline, higher-quality “AI voice”): Kokoro (ONNX)
pip install kokoro-onnx onnxruntime gdown
Pocket TTS (very lightweight, CPU-only runtime, voices such as alba, marius, javert, jean, fantine, cosette, eponine, and azelma)
pip install pocket-tts
Set Engine = Pocket, choose one of the built-in voices, or type a custom voice ID / prompt path.
If you have a short WAV prompt of the desired voice, specify it in the “Pocket prompt” field to clone that tone.
Use the Pocket speed control to speed up or slow down the generated speech (0.5–2.0×).
(Optional) You can also install via pip install annolid[pocket_tts] so the dependency is available automatically.
Voice cloning (offline, uses a short voice prompt): Chatterbox Turbo (ONNX)
pip install onnxruntime soundfile
Then select Engine = Chatterbox and choose a voice prompt audio file in the PDF Speech dock (or edit ~/.annolid/tts_settings.json).
Language packs for Kokoro when you want Chinese or Japanese voices:
pip install misaki[zh] # enables Mandarin (e.g., voice zf_001)
pip install misaki[ja] # enables Japanese (e.g., voice jf_alpha)
Fallback (online, simpler): Google TTS
pip install gTTS pydub
pydub needs ffmpeg available on your system.
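To see which of the packages above are already importable, a small script along these lines may help. It is only a sketch: the import names (fitz for pymupdf, kokoro_onnx, pocket_tts, gtts) are assumed from the usual pip-name-to-module conventions, and the helper check_backends is a made-up name for illustration, not part of Annolid.

```python
import importlib.util
import shutil

# pip package name -> assumed import name (assumptions, not taken from Annolid).
ASSUMED_MODULES = {
    "pymupdf": "fitz",
    "kokoro-onnx": "kokoro_onnx",
    "onnxruntime": "onnxruntime",
    "pocket-tts": "pocket_tts",
    "soundfile": "soundfile",
    "gTTS": "gtts",
    "pydub": "pydub",
}


def check_backends(modules=ASSUMED_MODULES):
    """Return {pip_name: bool} telling whether each module can be located."""
    status = {}
    for pip_name, module_name in modules.items():
        try:
            status[pip_name] = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Lookup itself failed; treat the module as unavailable.
            status[pip_name] = False
    return status


if __name__ == "__main__":
    for pip_name, found in check_backends().items():
        print(f"{pip_name:12s} {'installed' if found else 'missing'}")
    # pydub shells out to ffmpeg, so it must be discoverable on PATH.
    print(f"{'ffmpeg':12s} {'found' if shutil.which('ffmpeg') else 'not on PATH'}")
```

Only one TTS backend needs to report as installed; a "missing" line for the others is expected.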
Open a PDF in Annolid
Launch the GUI:
annolid
Go to File → Open PDF... and pick a .pdf.
Annolid switches into PDF view and shows these docks (typically on the right):
PDF Speech (voice / language / speed)
PDF Controls (page + zoom)
PDF Reader (click-to-read mode)
Option A: Speak a selection (fastest)
This works in both the fallback viewer (image + text panel) and the PDF.js viewer.
Select some text (either in the page text panel, or directly on the PDF page).
Right-click → Speak selection.
Option B: Click-to-read paragraphs (PDF.js reader mode)
This reads full paragraphs/sentences starting from where you click.
In the PDF Reader dock, enable Use PDF.js (required for reader).
Keep Enable click-to-read turned on.
Click a paragraph in the PDF page to start reading.
Use Pause/Resume, Stop, Prev, Next in the same dock.
If the reader says it’s unavailable, install QtWebEngine (pyqtwebengine in conda, or PyQtWebEngine via pip) and restart Annolid.
Change voice, language, and speed
Use the PDF Speech dock to set:
Voice (example: af_sarah)
Voice (Chinese): zf_001 (requires misaki[zh])
Voice (Japanese): jf_alpha (requires misaki[ja])
Language (example: en-us)
Speed (0.5–2.0)
These settings persist in ~/.annolid/tts_settings.json.
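To inspect what is currently saved, the file can be read with a few lines of Python. This is a minimal sketch that assumes the file is a single JSON object; the key names inside it are not documented here, so the code prints whatever it finds rather than relying on any particular field. The function name load_tts_settings is hypothetical.

```python
import json
from pathlib import Path

# Location stated in this guide.
SETTINGS_PATH = Path.home() / ".annolid" / "tts_settings.json"


def load_tts_settings(path=SETTINGS_PATH):
    """Return the saved settings as a dict, or {} if the file is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    # Assumption: the settings are stored as one JSON object.
    return data if isinstance(data, dict) else {}


if __name__ == "__main__":
    print(json.dumps(load_tts_settings(), indent=2, ensure_ascii=False))
```

An empty result simply means nothing has been saved yet (or the file could not be parsed).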
Troubleshooting
“PyMuPDF Required” dialog: run pip install pymupdf.
No audio output:
Make sure ANNOLID_DISABLE_AUDIO is not set.
On Linux servers/containers, ensure an audio device is present (or use a desktop machine).
First Kokoro run is slow: Annolid downloads model files into ~/.annolid/kokoro the first time.
gTTS fails: it requires internet access; also ensure ffmpeg is installed for pydub.
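Several of the checks above can be gathered in one place. The following is a rough sketch, not an Annolid tool: audio_diagnostics is a hypothetical helper, and it only reports whether the environment variable is present, whether ffmpeg is on PATH, and whether the Kokoro model folder exists. It cannot tell whether an audio device is actually usable.

```python
import os
import shutil
from pathlib import Path


def audio_diagnostics(env=os.environ, home=None):
    """Collect a few of the troubleshooting facts into a dict of booleans."""
    home = Path(home) if home is not None else Path.home()
    return {
        # Audio stays off while this variable is set, per the note above.
        "ANNOLID_DISABLE_AUDIO is set": "ANNOLID_DISABLE_AUDIO" in env,
        # pydub (used with gTTS) needs ffmpeg.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        # Present once the first Kokoro run has started downloading models.
        "Kokoro model folder exists": (home / ".annolid" / "kokoro").is_dir(),
    }


if __name__ == "__main__":
    for label, value in audio_diagnostics().items():
        print(f"{label}: {value}")
```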